Search for: All records where Creators/Authors contains "Kmoch, Joseph"

  1. Introduction: Recent AI advances, particularly the introduction of large language models (LLMs), have expanded the capacity to automate various tasks, including the analysis of text. This capability may be especially helpful in education research, where a lack of resources often hampers the ability to perform various kinds of analyses, particularly those requiring a high level of domain expertise and/or a large set of textual data. For instance, we recently coded approximately 10,000 state K-12 computer science standards, requiring over 200 hours of work by subject matter experts. If LLMs are capable of completing a task such as this, the savings in human resources would be immense.
     Research Questions: This study explores two research questions: (1) How do LLMs compare to humans in the performance of an education research task? and (2) What do errors in LLM performance on this task suggest about current LLM capabilities and limitations?
     Methodology: We used a random sample of state K-12 computer science standards. We compared the output of three LLMs (ChatGPT, Llama, and Claude) to the work of human subject matter experts in coding the relationship between each state standard and a set of national K-12 standards. Specifically, the LLMs and the humans determined whether each state standard was identical to, similar to, based on, or different from the national standards and (if it was not different) which national standard it resembled.
     Results: Each of the LLMs identified a different national standard than the subject matter expert in about half of the instances. When an LLM identified the same standard, it usually categorized the type of relationship (i.e., identical to, similar to, based on) in the same way as the human expert. However, the LLMs sometimes misidentified standards as ‘identical’.
     Discussion: Our results suggest that LLMs are not currently capable of matching human performance on the task of classifying learning standards. The misidentification of some state standards as identical to national standards, when they clearly were not, is an interesting error, given that traditional computing technologies can easily identify identical text. Similarly, some of the mismatches between LLM and human performance indicate clear errors on the part of the LLMs. However, other mismatches are difficult to assess, given the ambiguity inherent in this task and the potential for human error. We conclude the paper with recommendations for the use of LLMs in education research based on these findings.
    Free, publicly-accessible full text available June 1, 2026
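The coding task described in the abstract above can be illustrated with a short sketch: ask a model whether a state standard is identical to, similar to, based on, or different from the national (CSTA) standards, and if not different, which one it most resembles. The prompt wording, the output format, and the `query_llm` helper are assumptions made for illustration (a stand-in for whichever model API is used, e.g., ChatGPT, Llama, or Claude); this is not the authors' actual protocol.

```python
# Minimal sketch of LLM-assisted standards coding, under the assumptions above.
from typing import Callable, Dict

RELATIONSHIPS = ["identical to", "similar to", "based on", "different from"]

def build_prompt(state_standard: str, national_standards: Dict[str, str]) -> str:
    """Assemble a single classification prompt for one state standard."""
    numbered = "\n".join(f"{code}: {text}" for code, text in national_standards.items())
    return (
        "You are coding K-12 computer science learning standards.\n"
        f"State standard: \"{state_standard}\"\n\n"
        "National (CSTA) standards:\n"
        f"{numbered}\n\n"
        "Answer with one relationship (identical to / similar to / based on / "
        "different from) and, unless the relationship is 'different from', the "
        "code of the single closest national standard.\n"
        "Format: <relationship> | <code or NONE>"
    )

def code_state_standard(
    state_standard: str,
    national_standards: Dict[str, str],
    query_llm: Callable[[str], str],  # hypothetical: takes a prompt, returns the model's text reply
) -> Dict[str, str]:
    """Parse the model's reply into the same fields the human coders produced."""
    reply = query_llm(build_prompt(state_standard, national_standards)).strip()
    relationship, _, match = reply.partition("|")
    relationship = relationship.strip().lower()
    if relationship not in RELATIONSHIPS:
        relationship = "unparseable"  # flag replies that do not follow the requested format
    return {"relationship": relationship, "national_match": match.strip() or "NONE"}
```

Agreement with the human experts could then be measured by comparing the returned relationship and national_match fields against the expert codes, record by record.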
  2. Introduction: State and national learning standards play an important role in articulating and standardizing K-12 computer science education. However, these standards have not been extensively researched, especially in terms of their cognitive complexity. Analyses of cognitive complexity, accomplished via comparison of standards to a taxonomy of learning, can provide an important data point for understanding the prevalence of higher-order versus lower-order thinking skills in a set of standards.
     Objective: The objective of this study is to answer the research question: How do state and national K-12 computer science standards compare in terms of their cognitive complexity?
     Methods: We used Bloom’s Revised Taxonomy to assess the cognitive complexity of a dataset consisting of state computer science standards (n = 9695) and the 2017 Computer Science Teachers Association (CSTA) standards (n = 120). To enable a quantitative comparison of the standards, we assigned numbers to the Bloom’s levels.
     Results: The CSTA standards had a higher average level of cognitive complexity than most states’ standards. States were more likely than the CSTA standards to have standards at the lowest Bloom’s level. Cognitive complexity varied widely by state and, within a state, by grade band. For the states, standards at the evaluate level were least common; in the CSTA standards, the remember level was least common.
     Discussion: While there are legitimate critiques of Bloom’s Revised Taxonomy, it may nonetheless be a useful tool for assessing learning standards, especially comparatively. Our results point to differences between and within state and national standards. Recognition of these differences and their implications can be leveraged by future standards writers, curriculum developers, and computing education researchers to craft standards that best meet the needs of all learners.
    Free, publicly-accessible full text available June 1, 2026
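The quantitative comparison described in the Methods above, assigning numbers to Bloom's levels and averaging them over a set of coded standards, can be sketched as follows. The 1-6 mapping and the toy codings are assumptions for illustration only; the abstract does not specify which numbers were assigned, and no data from the study appear here.

```python
# Minimal sketch of a mean cognitive-complexity score per standards set,
# assuming the six Revised Bloom's levels are mapped to 1-6.
from statistics import mean
from typing import List

BLOOM_LEVELS = {
    "remember": 1,
    "understand": 2,
    "apply": 3,
    "analyze": 4,
    "evaluate": 5,
    "create": 6,
}

def average_complexity(coded_standards: List[str]) -> float:
    """Mean Bloom's level for one set of standards, each coded by level name."""
    return mean(BLOOM_LEVELS[label.lower()] for label in coded_standards)

# Toy example with made-up codings (not data from the study):
codings = {
    "State A": ["remember", "understand", "apply", "apply"],
    "CSTA":    ["understand", "analyze", "evaluate", "create"],
}
for name, labels in codings.items():
    print(f"{name}: mean Bloom's level = {average_complexity(labels):.2f}")
```

Per-state or per-grade-band comparisons would follow the same pattern, grouping the coded standards before averaging.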